Today's progress: 鳥哥的 Linux 私房菜 -- Linux Basics
I also came across a really cool tool today:
ericchiang/pup: Parsing HTML at the command line
It needs a Go environment installed first:
test@test:~$ sudo apt install golang-go
test@test:~$ go get github.com/ericchiang/pup
It can be chained with curl to pull data out of a web page using CSS selectors, much like a selector query in JS:
# this fetches the nytimes homepage
curl -L http://www.nytimes.com
# Try it again, but pipe it into pup and select for headlines
curl -s -L http://www.nytimes.com | pup 'h2.story-heading a text{}'
# Let's get the URLs for those headlines
# We want to extract the 'href' attribute:
curl -s -L http://www.nytimes.com | pup 'h2.story-heading a attr{href}'
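The same two selectors can also be tried offline, which helps when the live page's markup has changed. This is a minimal sketch: the sample HTML and the variable name `sample_html` are made up for illustration, and the script assumes `pup` reads HTML from standard input. The guard only reports when `pup` is not on the PATH.

```shell
#!/bin/sh
# Hypothetical sample page that mimics the h2.story-heading structure used above.
sample_html='<html><body>
<h2 class="story-heading"><a href="/a.html">First headline</a></h2>
<h2 class="story-heading"><a href="/b.html">Second headline</a></h2>
</body></html>'

if command -v pup >/dev/null 2>&1; then
    # Headline text of every matching link
    printf '%s\n' "$sample_html" | pup 'h2.story-heading a text{}'
    # href attribute of every matching link
    printf '%s\n' "$sample_html" | pup 'h2.story-heading a attr{href}'
else
    # The Go install step may place the binary outside the current PATH
    echo "pup not found on PATH" >&2
fi
```

Once the selectors behave as expected on the sample, the `printf` part can be swapped back for `curl -s -L <url>`.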